An example of forecasting time series data with multiple seasonal patterns and trend with linear regression. Using the demand for electricity in the UK dataset from the UKgrid package

Loading the packages

library(TSstudio)
library(UKgrid)
library(magrittr)
library(lubridate)
library(forecast)

Loading the data

Loading the full series

df1 <- extract_grid(type = "data.frame", # Set the output format
                   columns = "ND", # Select the Net Demand for elctricity column
                   aggregate = "daily") # Set the agrregation level

head(df1)
##    TIMESTAMP      ND
## 1 2011-01-01 1671744
## 2 2011-01-02 1760123
## 3 2011-01-03 1878748
## 4 2011-01-04 2076052
## 5 2011-01-05 2103866
## 6 2011-01-06 2135202

Converting the object to a time series format

ts.obj <- ts(df1$ND, start = c(2011, 1), frequency = 365)

Descriptive Analysis

Plot the series

ts_plot(ts.obj = ts.obj, 
        title = "The Demand for Electricity in the UK",
        Ytitle = "MW")

Seasonal analysis

Different plots for seasonality analysis

ts_seasonal(ts.obj = ts.obj)
ts_heatmap(ts.obj = df1)
ts_decompose(ts.obj = ts.obj)
ts_quantile(df1, period = "weekdays")
ts_quantile(df1, period = "monthly", n = 2)

There is a clear evidence for multiple seasonality patterns in the series:

  • Day of the week
  • Month of the year

Creating new features

Trend component - Generally the tslm function will create this functionality

df1$index <- 1:nrow(df1)

Seasonal component

df1$wday <- wday(df1$TIMESTAMP, label = TRUE) %>% factor(ordered = FALSE)
df1$month <- month(df1$TIMESTAMP, label = TRUE) %>% factor(ordered = FALSE)

head(df1)
##    TIMESTAMP      ND index wday month
## 1 2011-01-01 1671744     1  Sat   Jan
## 2 2011-01-02 1760123     2  Sun   Jan
## 3 2011-01-03 1878748     3  Mon   Jan
## 4 2011-01-04 2076052     4  Tue   Jan
## 5 2011-01-05 2103866     5  Wed   Jan
## 6 2011-01-06 2135202     6  Thu   Jan

Training approach

h = 365 # Set the forecast horizon

Create a data frame with the input values of the actual forecast, must have the same features we are using to train the model. In this case all the inputs are deterministic (day of the week, month, etc.), if using non-deterministic (such as temperature, prices, etc.) inputs, you will have to forecast their future values

# Create a data frame with futures dates
# using the last date of the actual data + one day
fc_df <- data.frame(date = seq.Date(from = max(df1$TIMESTAMP) + days(1), by = "days", length.out = h))

# Continue the index input
fc_df$index <- seq((max(df1$index) + 1), by = 1, length.out = h)

# Day of the week variable
fc_df$wday <- wday(fc_df$date, label = TRUE) %>% factor(ordered = FALSE)

# Month of the week variable
fc_df$month <- month(fc_df$date, label = TRUE) %>% factor(ordered = FALSE)

# Comparing between the end of the actual data frame and the beginning of the forecast data frame
tail(df1)
##       TIMESTAMP      ND index wday month
## 2900 2018-12-09 1454210  2900  Sun   Dec
## 2901 2018-12-10 1756626  2901  Mon   Dec
## 2902 2018-12-11 1777229  2902  Tue   Dec
## 2903 2018-12-12 1712503  2903  Wed   Dec
## 2904 2018-12-13 1731810  2904  Thu   Dec
## 2905 2018-12-14 1810382  2905  Fri   Dec
head(fc_df)
##         date index wday month
## 1 2018-12-15  2906  Sat   Dec
## 2 2018-12-16  2907  Sun   Dec
## 3 2018-12-17  2908  Mon   Dec
## 4 2018-12-18  2909  Tue   Dec
## 5 2018-12-19  2910  Wed   Dec
## 6 2018-12-20  2911  Thu   Dec

Creating a training and testing partitions

ts_par <- ts_split(ts.obj = ts.obj, sample.out = h)

train <- ts_par$train
test <- ts_par$test

train_df <- df1[1:length(train),]
test_df <- df1[(length(train) + 1):nrow(df1),]

nrow(train_df) == length(train)
## [1] TRUE
nrow(test_df) == length(test)
## [1] TRUE

Baseline model

Using the built-in functionality of the tslm function using trend and seasonal component

Note that by default, the tslm function define the season by the frequency of the series, which in this case is the day of the year (create categorical variable with 365 levels)

md1 <- tslm(train ~ trend + season)
fc1 <- forecast(md1, h = h)


test_forecast(actual = ts.obj, 
              forecast.obj = fc1,
              test = test)
accuracy(fc1, test)
##                        ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 7.593520e-13 115565.9 95703.52 -0.5631289 6.238168 0.8013792
## Test set     3.026682e+04 117086.9 95400.12  1.4226202 6.579675 0.7988387
##                   ACF1 Theil's U
## Training set 0.5148323        NA
## Test set     0.5290950 0.9851808

With MAPE Error rate of 6.23% and 6.57% on the training and testing partitions, respectively

Add the day of the week effect

md2 <- tslm(train ~ trend + season + wday , data = train_df)
fc2 <- forecast(md2, h = h, newdata = test_df)


test_forecast(actual = ts.obj, 
              forecast.obj = fc2,
              test = test)
accuracy(fc2, test)
##                         ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -2.769075e-12 64590.90 47940.93 -0.1616403 3.001469 0.4014363
## Test set      3.048622e+04 89925.85 70838.43  1.8454164 4.767082 0.5931699
##                   ACF1 Theil's U
## Training set 0.7422595        NA
## Test set     0.6524709 0.7364615

Doing a better job with 3.00% and 4.76% error rate on the training and testing partitions, respectively. We will use this approach to create the final forecast

Finalizing the forecast

Retrain the model with the data and forecast the following 365 days

final_md <- tslm(ts.obj ~ trend + season + wday, data = df1)

final_fc <- forecast(final_md, h = h, newdata = fc_df)

plot_forecast(final_fc)